Efficient Inference in Markov Control Problems
Abstract
Markov control algorithms that perform smooth, non-greedy updates of the policy have been shown to be very general and versatile, with policy gradient and Expectation Maximisation algorithms being particularly popular. For these algorithms, marginal inference of the reward-weighted trajectory distribution is required to perform policy updates. We discuss a new exact inference algorithm for these marginals in the finite-horizon case that is more efficient than the standard approach based on classical forward-backward recursions. We also provide a principled extension to infinite-horizon Markov Decision Problems that explicitly accounts for an infinite horizon. This extension yields a novel algorithm for both policy gradients and Expectation Maximisation in infinite-horizon problems.

1 MARKOV DECISION PROBLEMS

A Markov Decision Problem (MDP) is described by an initial state distribution p1(s1), transition distributions p(st+1|st, at) and a reward function Rt(st, at), where the state and action at time t are denoted by st and at respectively (Sutton and Barto, 1998). The state and action spaces can be either discrete or continuous. For a discount factor γ ∈ [0, 1), the reward is defined as Rt(st, at) = γ^(t−1) R(st, at) for a stationary reward R(st, at). We assume a stationary policy, π, defined as a set of conditional distributions over the action space, πa,s = p(at = a|st = s, π). To avoid cumbersome notation we also use zt = {st, at} to denote a state-action pair, and the bold typeface, zt, to denote a vector. The total expected reward of the MDP (the policy utility) is given by

U(π) = Σt E_{p(st, at|π)} [ Rt(st, at) ].
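As a concrete illustration of the quantities above, the following minimal sketch evaluates the policy utility U(π) of a small tabular finite-horizon MDP by a forward recursion over the state marginals p(st|π). The state/action/horizon sizes and the random transition and reward tables are hypothetical toy values, not taken from the paper, and this is the standard forward computation rather than the paper's new inference algorithm.

```python
import numpy as np

# Toy tabular finite-horizon MDP (hypothetical sizes, random dynamics).
S, A, H = 3, 2, 10                            # states, actions, horizon
gamma = 0.9                                   # discount factor in [0, 1)

rng = np.random.default_rng(0)
p1 = np.full(S, 1.0 / S)                      # initial state distribution p1(s1)
P = rng.dirichlet(np.ones(S), size=(S, A))    # P[s, a, s'] = p(s'|s, a)
R = rng.random((S, A))                        # stationary reward R(s, a) in [0, 1)
pi = np.full((S, A), 1.0 / A)                 # stationary uniform policy pi(a|s)

# Forward recursion: propagate the marginal p(s_t|pi) and accumulate
# U(pi) = sum_t gamma^(t-1) E[R(s_t, a_t)]  (t is 0-indexed below).
mu = p1.copy()
U = 0.0
for t in range(H):
    joint = mu[:, None] * pi                  # p(s_t, a_t | pi)
    U += gamma**t * np.sum(joint * R)         # expected discounted reward at t
    mu = np.einsum('sa,sap->p', joint, P)     # marginal p(s_{t+1} | pi)
print(U)
```

Because the rewards lie in [0, 1), the utility is bounded above by the geometric sum (1 − γ^H)/(1 − γ), which gives a quick sanity check on the recursion.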
Similar papers
Efficient Markov Logic Inference for Natural Language Semantics
Using Markov logic to integrate logical and distributional information in natural-language semantics results in complex inference problems involving long, complicated formulae. Current inference methods for Markov logic are ineffective on such problems. To address this, we propose a new inference algorithm based on SampleSearch that computes probabilities of complete formulae rather tha...
Efficient Sampling for Gaussian Process Inference using Control Variables
Sampling functions in Gaussian process (GP) models is challenging because of the highly correlated posterior distribution. We describe an efficient Markov chain Monte Carlo algorithm for sampling from the posterior process of the GP model. This algorithm uses control variables which are auxiliary function values that provide a low dimensional representation of the function. At each iteration, t...
The Segmented iHMM: A Simple, Efficient Hierarchical Infinite HMM
We propose the segmented iHMM (siHMM), a hierarchical infinite hidden Markov model (iHMM) that supports a simple, efficient inference scheme. The siHMM is well suited to segmentation problems, where the goal is to identify points at which a time series transitions from one relatively stable regime to a new regime. Conventional iHMMs often struggle with such problems, since they have no mechanis...
Stacked Graphical Learning: Learning in Markov Random Fields using Very Short Inhomogeneous Markov Chains
We describe stacked graphical learning, a meta-learning scheme in which a base learner is augmented by expanding one instance's features with predictions on other related instances. Stacked graphical learning is efficient, especially during inference, captures dependencies easily, and can be constructed from any kind of base learner. In experiments on two classification pro...
Global optimization using the asymptotically independent Markov sampling method
In this paper, we introduce a new efficient stochastic simulation method, AIMS-OPT, for approximating the set of globally optimal solutions when solving optimization problems such as optimal performance-based design problems. This method is based on Asymptotically Independent Markov Sampling (AIMS), a recently developed advanced simulation scheme originally proposed for Bayesian inference. This...
Published: 2011